Take-home Exercise 4 - Part 1

Decoding Chaos: Initial Data Exploratory Analysis (IDEA)

Author

Teo Suan Ern

Published

February 27, 2024

Modified

March 2, 2024

1. Overview


1.1 Project Brief

Take-home Exercise 4 is the preliminary work for the final group project, Decoding Chaos. Armed conflicts arising from political violence and coordinated attacks targeting innocent civilians have been on the rise globally, threatening the public both physically and psychologically. A good visual analysis of armed conflicts is essential to help (1) discover armed conflict trends and (2) conceptualise armed conflict spaces.

The project team consists of three members, and each member will take one of the main prototype modules as follows:

  • Exploratory Data Analysis (Initial & Geospatial)

  • Spatial Point Pattern Analysis

  • Multivariate Clustering Analysis

1.2 Project Objectives

The project uses open-source data on armed conflict events from the Armed Conflict Location & Event Data Project (ACLED). The objective of my assignment is to build a prototype user interface (UI) design for Exploratory Data Analysis (Initial & Geospatial) that provides easy-to-use and insightful visualisation tools, suitable for the Defence and Security sectors to develop effective countermeasures and strategies.

1.3 Exploratory Data Analysis

This assignment is separated into three segments (web pages):

  1. Initial Data Exploratory Analysis (IDEA) – Current Page
  2. Geospatial Data Exploratory Analysis (GDEA) – Click here for GDEA page.
  3. Prototype: Exploratory – Click here for Prototype: Exploratory page.

2. Initial Data Preparation


2.1 Install and launch R packages

The project uses p_load() from the pacman package to check whether the required R packages are installed on the computer, install any that are missing, and load them.

The following code chunk is used to install and launch the R packages.

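# DT and zoo are called later as DT::datatable() and zoo::as.yearmon(),
# so they are included here to make sure they are installed.
pacman::p_load(tidyverse, kableExtra, knitr, highcharter, scales, 
               ggthemes, RColorBrewer, lubridate, wordcloud, tidytext,
               ggforce, ggraph, igraph, visNetwork, tm, plotly, DT, zoo)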

  • tidyverse: a family of modern R packages specially designed to support data science, analysis and communication tasks, including creating static statistical graphs.

  • knitr: a report generation tool.

  • ggthemes: an R package that provides extra themes, geoms and scales for the ggplot2 package.

  • DT: an R interface to the JavaScript library DataTables that creates interactive tables on HTML pages.

  • plotly: an R package for creating interactive charts.

  • scales: an R package for controlling axis and legend labels.

  • lubridate: an R package that makes it easier to work with dates and times.

  • wordcloud: a text mining package and word cloud generator.

  • tidytext: an R package that provides functions for converting text to and from tidy data formats.

  • tm: an R package that provides text mining applications.

  • ggforce: an extension of ggplot2 that provides additional stats and geoms for visual data analysis.

  • ggraph: an extension of ggplot2 for building network graph visualisations.

  • igraph: an interface for the analysis of graphs or networks.

  • visNetwork: an R package for interactive network visualisation.

  • highcharter: a wrapper that contains ‘highcharts’ library for plotting of R objects.
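
For readers who do not have pacman available, the check-install-load behaviour described above can be approximated in base R. This is only a minimal sketch, not pacman's actual implementation; `missing_packages()` and `load_or_install()` are hypothetical helper names introduced here for illustration.

```r
# Return the subset of `pkgs` that is not currently installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install anything that is missing, then attach every package in `pkgs`.
load_or_install <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# Base packages ship with R, so nothing should be reported as missing here.
missing_packages(c("stats", "utils"))
```

Calling `load_or_install(c("tidyverse", "plotly"))` would then behave roughly like the p_load() call for those two packages.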

2.2 Import Data

The project examines the dataset from the Armed Conflict Location & Event Data Project (ACLED) for Myanmar, covering the years 2010 to 2023.

data <- read.csv("data/1900-01-01-2024-02-26-Southeast_Asia-Myanmar.csv")

2.3 Overview of the data

The combined data consists of 55,574 observations and 35 variables. Each row describes one event in Myanmar (such as an act of political violence or a demonstration), including its type, agents, location, date and other characteristics.

Dataset Structure

Use str() to check the structure of the data.

str(data)
'data.frame':   55574 obs. of  35 variables:
 $ event_id_cnty     : chr  "MMR56099" "MMR56222" "MMR56370" "MMR56376" ...
 $ event_date        : chr  "31-Dec-23" "31-Dec-23" "31-Dec-23" "31-Dec-23" ...
 $ year              : int  2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
 $ time_precision    : int  1 1 1 1 1 1 1 1 1 1 ...
 $ disorder_type     : chr  "Political violence" "Political violence" "Political violence" "Demonstrations" ...
 $ event_type        : chr  "Explosions/Remote violence" "Explosions/Remote violence" "Battles" "Protests" ...
 $ sub_event_type    : chr  "Shelling/artillery/missile attack" "Shelling/artillery/missile attack" "Armed clash" "Peaceful protest" ...
 $ actor1            : chr  "Military Forces of Myanmar (2021-)" "Military Forces of Myanmar (2021-)" "Phoenix DF: Phoenix Defense Force (Nattalin)" "Protesters (Myanmar)" ...
 $ assoc_actor_1     : chr  "" "" "" "" ...
 $ inter1            : int  1 1 3 6 1 1 3 1 2 1 ...
 $ actor2            : chr  "" "Civilians (Myanmar)" "Military Forces of Myanmar (2021-)" "" ...
 $ assoc_actor_2     : chr  "" "" "" "" ...
 $ inter2            : int  0 7 1 0 7 0 1 0 1 7 ...
 $ interaction       : int  10 17 13 60 17 10 13 10 12 17 ...
 $ civilian_targeting: chr  "" "Civilian targeting" "" "" ...
 $ iso               : int  104 104 104 104 104 104 104 104 104 104 ...
 $ region            : chr  "Southeast Asia" "Southeast Asia" "Southeast Asia" "Southeast Asia" ...
 $ country           : chr  "Myanmar" "Myanmar" "Myanmar" "Myanmar" ...
 $ admin1            : chr  "Mon" "Rakhine" "Bago-West" "Sagaing" ...
 $ admin2            : chr  "Mawlamyine" "Maungdaw" "Thayarwady" "Yinmarbin" ...
 $ admin3            : chr  "Ye" "Maungdaw" "Nattalin" "Salingyi" ...
 $ location          : chr  "Aing Shey" "Kaing Gyi (NaTaLa)" "Kyauk Pyoke" "Let Pa Taung" ...
 $ latitude          : num  15.3 20.7 18.6 22.1 18.6 ...
 $ longitude         : num  98 92.4 95.8 95.1 95.8 ...
 $ geo_precision     : int  1 2 2 2 1 1 1 2 2 1 ...
 $ source            : chr  "Democratic Voice of Burma" "Development Media Group; Narinjara News" "Khit Thit Media; Myanmar Pressphoto Agency" "Myanmar Labour News" ...
 $ source_scale      : chr  "National" "Subnational" "National" "National" ...
 $ notes             : chr  "On 31 December 2023, in Aing Shey village (Ye township, Mawlamyine district, Mon state), following a clash betw"| __truncated__ "On 31 December 2023, in Kaing Gyi (Mro) village (coded as Kaing Gyi (NaTaLa)) (Maungdaw township, Maungdaw dist"| __truncated__ "On 31 December 2023, near Kyauk Pyoke village (Nattalin township, Thayarwady district, Bago-West region), the P"| __truncated__ "On 31 December 2023, in the Let Pa Taung area of Salingyi township (Yinmarbin district, Sagaing region), protes"| __truncated__ ...
 $ fatalities        : int  0 0 4 0 0 0 3 0 0 0 ...
 $ tags              : chr  "" "" "" "crowd size=no report" ...
 $ timestamp         : int  1704831212 1704831213 1704831214 1704831214 1704831214 1704831216 1704831216 1704831216 1704831216 1704831216 ...
 $ population_1km    : int  NA NA NA 749 NA 178 6634 671 687 35292 ...
 $ population_2km    : int  NA NA NA 521 NA 135 19078 2197 654 85732 ...
 $ population_5km    : int  3081 NA NA 1358 NA NA 34396 3144 656 169473 ...
 $ population_best   : int  3081 NA NA 749 NA NA 34396 3144 656 85732 ...

The output above reveals that event_date is in character format instead of date format.

Use colSums to check for missing values

The output below shows that there are four variables with missing values, all of them population estimates.

# check missing values
missing_values <- colSums(is.na(data)) 

missing_values_only <- missing_values[missing_values > 0]

missing_values_only %>% kable() 
Variable          Missing values
population_1km    9827
population_2km    9996
population_5km    10318
population_best   20848

Use duplicated() to check for duplicates:

There are no duplicate entries in the dataset.

data[duplicated(data),]
 [1] event_id_cnty      event_date         year               time_precision    
 [5] disorder_type      event_type         sub_event_type     actor1            
 [9] assoc_actor_1      inter1             actor2             assoc_actor_2     
[13] inter2             interaction        civilian_targeting iso               
[17] region             country            admin1             admin2            
[21] admin3             location           latitude           longitude         
[25] geo_precision      source             source_scale       notes             
[29] fatalities         tags               timestamp          population_1km    
[33] population_2km     population_5km     population_best   
<0 rows> (or 0-length row.names)
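
Both checks above can be tried on a small made-up data frame before running them on the full extract. The sketch below is illustrative only: the values are invented, and checking the event ID on its own with anyDuplicated() is an additional step that the original analysis does not perform.

```r
# Toy data frame standing in for the ACLED extract (values are made up).
toy <- data.frame(
  event_id_cnty  = c("MMR1", "MMR2", "MMR3", "MMR3"),
  fatalities     = c(0L, 2L, 1L, 1L),
  population_1km = c(NA, 749L, NA, NA),
  stringsAsFactors = FALSE
)

# Share of missing values per column, in percent.
missing_pct <- round(colMeans(is.na(toy)) * 100, 1)

# Number of fully duplicated rows, and whether the event ID alone repeats.
n_dup_rows <- sum(duplicated(toy))
id_repeats <- anyDuplicated(toy$event_id_cnty) > 0

missing_pct
n_dup_rows
id_repeats
```

Reporting missingness as a percentage makes it easier to judge whether a column such as population_best is usable at all.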

3. Data Wrangling


The flowchart diagram below provides an overview of the key variables used in this project.

flowchart TD
  A(Key Variables Used \n event_id_cnty)
  A --> B(Time Period)
  A --> C(Characteristic of Incident)
  A --> D(Location)
  
  B --> E(year)
  B --> F(date)
  B -.-> G(New Variables)
  G -.-> H(day)
  G -.-> I(week number)
  G -.-> J(month)


  C --> K(event_type)
  C --> L(sub_event_type)
  C --> M(actor1)
  C --> N(actor2)
  C --> O(fatalities)
  C -.-> P(New Variables)
  P -.-> Q(total incidents)
  P -.-> R(total fatalities)
  P -.-> S(political violence rate)
  P -.-> T(violence against civilian rate)
  P -.-> U(territory exchange rate \n-non-state exchange)
  P -.-> V(territory exchange rate \n-government regains territory)
  
  
  D --> W(country)
  D --> X(longitude)
  D --> Y(latitude)
  D --> Z(admin1)
  D --> AA(admin2)
  D --> AB(admin3)
  D -.-> AC(New Variables)
  AC -.-> AD(geometry points)
  AC -.-> AE(shapeID)

3.1 Convert event_date format

The code chunk below uses dmy() from lubridate to convert event_date from character format to date format:

data$event_date <- dmy(data$event_date)
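
A quick way to confirm that the conversion behaves as expected is to parse a few strings in the same "31-Dec-23" style and count the failures. This is a small sketch on made-up input; it assumes an English locale for the month abbreviations.

```r
library(lubridate)

raw_dates <- c("31-Dec-23", "01-Feb-10", "not a date")

# quiet = TRUE suppresses the parse warning so that failures can be counted.
parsed <- dmy(raw_dates, quiet = TRUE)

parsed
sum(is.na(parsed))  # number of values that failed to parse
```

Running the same `sum(is.na(...))` check on data$event_date after conversion would show whether any rows were lost to parsing.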

3.2 Create new variables

The code chunk below first keeps only the events with at least one reported fatality, then creates the following new variables per year from the total armed conflict incidents and total fatalities (by event_type and sub_event_type):

  • Annual percentage of political violence

  • Annual percentage of violence against civilian

  • Annual percentage of government regains territory

  • Annual percentage of non-state actor overtakes territory

data2 <- data %>%
  filter(fatalities > 0) %>%
  group_by(year) %>%
  mutate(
    total_fata = sum(fatalities),
    
    total_inci = n(),
    
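    ## incidents
    # NOTE: the four names below are reused in the fatalities block further
    # down, so these incident-based values are overwritten in the final output.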
    # Political violence rates
    political_rate = round(
      sum(total_inci[event_type %in% c("Battles", "Protests", "Explosions/Remote violence", "Violence against civilians")]) /
        sum(total_inci) * 100),
    
    # Violence against civilian rates
    civilian_rate = round(
      sum(total_inci[event_type == "Violence against civilians"]) / 
        sum(total_inci) * 100),
    
    # Exchange of territory
    non_state_exchange = round(
      sum(total_inci[sub_event_type == "Non-state actor overtakes territory"]) /
        sum(total_inci) * 100),
      
    govt_regain_exchange = round(
      sum(total_inci[sub_event_type == "Government regains territory"]) / 
        sum(total_inci) * 100),
    
    
    ## fatalities
    # Political violence rates
    political_rate = round(
      sum(total_fata[event_type %in% c("Battles", "Protests", 
                                       "Explosions/Remote violence", 
                                       "Violence against civilians")]) /
        sum(total_fata) * 100, 2),
    
    # Violence against civilian rates
    civilian_rate = round(
      sum(total_fata[event_type == "Violence against civilians"]) / 
        sum(total_fata) * 100, 2),
    
    # Exchange of territory
    non_state_exchange = round(
      sum(total_fata[sub_event_type == "Non-state actor overtakes territory"]) /
        sum(total_fata) * 100, 2),
      
    govt_regain_exchange = round(
      sum(total_fata[sub_event_type == "Government regains territory"]) / 
        sum(total_fata) * 100, 2)  # denominator made consistent with the other fatality rates
    
  ) %>%
  ungroup()
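
Because total_inci and total_fata are constant within each year, it can help to see an incident share and a fatality share computed directly from the row counts and from the fatalities column. The sketch below is an alternative formulation on made-up data, not the code used to produce the outputs on this page; `civilian_inci_rate`, `civilian_fata_rate` and `is_civilian()` are hypothetical names.

```r
library(dplyr)

# Made-up events: every row has at least one fatality, as in the filtered data.
toy <- data.frame(
  year       = c(2022, 2022, 2022, 2023, 2023),
  event_type = c("Battles", "Violence against civilians", "Battles",
                 "Violence against civilians", "Battles"),
  fatalities = c(4, 1, 5, 3, 1),
  stringsAsFactors = FALSE
)

is_civilian <- function(x) x == "Violence against civilians"

rates <- toy %>%
  group_by(year) %>%
  summarise(
    total_inci = n(),
    total_fata = sum(fatalities),
    # share of incidents: rows of this type / all rows in the year
    civilian_inci_rate = round(mean(is_civilian(event_type)) * 100, 2),
    # share of fatalities: deaths in this type / all deaths in the year
    civilian_fata_rate = round(
      sum(fatalities[is_civilian(event_type)]) / sum(fatalities) * 100, 2),
    .groups = "drop"
  )

rates
```

In the toy data the two measures diverge: in 2023, violence against civilians accounts for half of the incidents but three quarters of the fatalities.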

3.3 Filter data columns

The code chunk below drops the variables that are not used in this project.

final <- data2 %>%
  select(-time_precision, -geo_precision, -source_scale, -timestamp, -tags, 
         -population_1km, -population_2km, -population_5km, -population_best)

The code chunk below saves the dataset in .rds format for the subsequent geospatial EDA.

write_rds(final, 
          "data/final.rds")

Use str() to check the structure of the final dataset.

str(final)
tibble [13,177 × 32] (S3: tbl_df/tbl/data.frame)
 $ event_id_cnty       : chr [1:13177] "MMR56370" "MMR56871" "MMR56878" "MMR56900" ...
 $ event_date          : Date[1:13177], format: "2023-12-31" "2023-12-31" ...
 $ year                : int [1:13177] 2023 2023 2023 2023 2023 2023 2023 2023 2023 2023 ...
 $ disorder_type       : chr [1:13177] "Political violence" "Political violence" "Political violence" "Political violence" ...
 $ event_type          : chr [1:13177] "Battles" "Battles" "Explosions/Remote violence" "Battles" ...
 $ sub_event_type      : chr [1:13177] "Armed clash" "Armed clash" "Air/drone strike" "Armed clash" ...
 $ actor1              : chr [1:13177] "Phoenix DF: Phoenix Defense Force (Nattalin)" "MSRF: Mon State Revolutionary" "Military Forces of Myanmar (2021-)" "MSPDF: Myaung Special People's Defense Force" ...
 $ assoc_actor_1       : chr [1:13177] "" "Daw Na Column; YGF: Ye Guerrilla Force; ABSDF: All Burma Students' Democratic Front; MSRU: Mon State Revolution"| __truncated__ "" "NRDF: Natogyi Regional Defense Force; People's Defense Force - Meiktila District; People's Defense Force - Mony"| __truncated__ ...
 $ inter1              : int [1:13177] 3 3 1 3 3 1 3 1 2 3 ...
 $ actor2              : chr [1:13177] "Military Forces of Myanmar (2021-)" "Military Forces of Myanmar (2021-)" "Civilians (Myanmar)" "Military Forces of Myanmar (2021-)" ...
 $ assoc_actor_2       : chr [1:13177] "" "" "" "Police Forces of Myanmar (2021-)" ...
 $ inter2              : int [1:13177] 1 1 7 1 1 7 1 7 1 1 ...
 $ interaction         : int [1:13177] 13 13 17 13 13 17 13 17 12 13 ...
 $ civilian_targeting  : chr [1:13177] "" "" "Civilian targeting" "" ...
 $ iso                 : int [1:13177] 104 104 104 104 104 104 104 104 104 104 ...
 $ region              : chr [1:13177] "Southeast Asia" "Southeast Asia" "Southeast Asia" "Southeast Asia" ...
 $ country             : chr [1:13177] "Myanmar" "Myanmar" "Myanmar" "Myanmar" ...
 $ admin1              : chr [1:13177] "Bago-West" "Mon" "Sagaing" "Sagaing" ...
 $ admin2              : chr [1:13177] "Thayarwady" "Mawlamyine" "Katha" "Monywa" ...
 $ admin3              : chr [1:13177] "Nattalin" "Ye" "Tigyaing" "Chaung-U" ...
 $ location            : chr [1:13177] "Kyauk Pyoke" "Kyaung Ywar" "Kan Pauk" "Chaung-U" ...
 $ latitude            : num [1:13177] 18.6 15.3 23.9 22 21.3 ...
 $ longitude           : num [1:13177] 95.8 98 96.1 95.3 95.4 ...
 $ source              : chr [1:13177] "Khit Thit Media; Myanmar Pressphoto Agency" "Democratic Voice of Burma; Khit Thit Media; Myanmar Pressphoto Agency" "Democratic Voice of Burma; Khit Thit Media; Myanmar Pressphoto Agency; Radio Free Asia" "Khit Thit Media; Myanmar Pressphoto Agency" ...
 $ notes               : chr [1:13177] "On 31 December 2023, near Kyauk Pyoke village (Nattalin township, Thayarwady district, Bago-West region), the P"| __truncated__ "On 31 December 2023, in Kyaung Ywar village (Ye township, Mawlamyine district, Mon state), a combined force of "| __truncated__ "On 31 December 2023, in Kan Pauk village (Tigyaing township, Katha district, Sagaing region), the Myanmar milit"| __truncated__ "On 31 December 2023, in Chaung-U town (Chaung-U township, Monywa district, Sagaing region), a combined force of"| __truncated__ ...
 $ fatalities          : int [1:13177] 4 3 1 1 2 1 3 1 1 4 ...
 $ total_fata          : int [1:13177] 15716 15716 15716 15716 15716 15716 15716 15716 15716 15716 ...
 $ total_inci          : int [1:13177] 4054 4054 4054 4054 4054 4054 4054 4054 4054 4054 ...
 $ political_rate      : num [1:13177] 99.5 99.5 99.5 99.5 99.5 ...
 $ civilian_rate       : num [1:13177] 23.3 23.3 23.3 23.3 23.3 ...
 $ non_state_exchange  : num [1:13177] 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 0.02 ...
 $ govt_regain_exchange: num [1:13177] 0 0 0 0 0 0 0 0 0 0 ...

4. Initial Exploratory Data Analysis


4.1 Descriptive Statistics

Before proceeding with data visualisation, it is essential to be able to navigate the dataset of 13,177 observations and 32 variables with ease. This segment helps users find specific observations instead of scrolling through them one by one. The interactive data table is created using the DT package.

Design Features - Interactive Data Table
  • Choose the number of observations displayed from the dropdown (5, 10, 25, 50 or 100 entries). This ensures that the observations do not span the entire web page.

  • View other pages of observations with the “previous” or “next” button.

  • Search for specific observations with the search bar, which matches a string or numerical value in any column of an observation.

  • Filter observations with the filter bar directly below the column headers.

  • Column visibility allows users to select the columns they are interested in viewing and hide the rest.

DT::datatable(
  final, 
  class = "compact",
  filter = "top", 
  extensions = c("Buttons"),
  options = list(
    pageLength = 5,
    columnDefs = list(
      list(targets = c(1:23, 26:32), className = "dt-center"), # text align center
      list(targets = c(24, 25), visible = FALSE)
    ),
    buttons = list(
      list(extend = "colvis", columns = c(1:32)) # all 32 columns of final
      ),

    dom = "Bpiltf"
  ),
  caption = "Table 1:"
)

4.2 Distribution Analysis

Distribution of armed conflicts over the years in Myanmar’s Sub-national Administrative Region 1

The dataset contains a variable, admin1, which is the largest sub-national administrative region in which the armed conflict event took place. The team hopes to visualise the distribution of armed conflicts at the administrative-region level and determine whether the events are spread out across the years, started or ended in certain years, or are highly concentrated in certain years. A jittered-cum-boxplot (a sina plot overlaid with a boxplot) is used to show the frequency of conflict events and how it has changed over the years.

Design Features
  • The prototype proposes to include the following interactivity elements for users’ data exploration:
    • Dropdown filters such as event_type
    • Radio button selection on total armed conflicts or total fatalities
    • Slider bar to select the years
    • Checkbox selection to filter/ select by sub-national administrative region 1
  • forcats::fct_infreq() orders the factor levels by how frequently they occur, so that the categories on the axis are arranged by number of events.
  • geom_boxplot(), rendered through ggplotly(), shows summary statistics such as the minimum, maximum, median, and first and third quartiles on hover.
  • geom_sina() is useful for plotting a single variable in a multiclass dataset to show the density distribution within each class.

box1 <- ggplot(final, aes(x = forcats::fct_infreq(admin1), y = event_date, 
                          color = factor(admin1), fill = factor(admin1))) +
  geom_sina(method = "density", alpha = .3) +
  geom_boxplot(width = .2, color = "#000000", fill = NA, size = .5, 
               outlier.shape = NA, position = position_nudge(.25)) +
  coord_flip()+
  theme(legend.position = "none", 
        plot.title.position = "plot") +
  labs(title = "Frequency of Conflict Has Increased Over Time in Most Administrative Regions", subtitle = "Year 2010 to Year 2023") +
  labs(y = "Year (2010-2023)",
       x = "Administrative Region 1", 
       caption = "Data Source: ACLED (2023)")

ggplotly(box1)
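
The level ordering that fct_infreq() applies to the region axis in the chart above can be seen on a small factor. This is a toy illustration with a handful of made-up observations.

```r
library(forcats)

regions <- factor(c("Sagaing", "Mon", "Sagaing", "Rakhine", "Sagaing", "Mon"))

levels(regions)              # alphabetical by default
levels(fct_infreq(regions))  # most frequent level first
```

Ordering by frequency rather than alphabetically makes the regions with the most recorded events easy to pick out.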

Distribution of armed conflicts over the years in Myanmar based on Event Types

The dataset contains a variable, event_type, which records the nature of the armed conflict event. The team hopes to visualise the distribution of all event types and determine whether the events are spread out across the years, started or ended in certain years, or are highly concentrated in certain years. A jittered-cum-boxplot (a sina plot overlaid with a boxplot) is used to show the frequency of conflict events and how it has changed over the years.

box2 <- ggplot(final, aes(x = forcats::fct_infreq(event_type), y = event_date, 
                          color = factor(event_type), fill = factor(event_type))) +
  geom_sina(method = "density", alpha = .3) +
  geom_boxplot(width = .2, color = "#000000", fill = NA, size = .5, 
               outlier.shape = NA, position = position_nudge(.25)) +
  coord_flip()+
  theme(legend.position = "none", 
        plot.title.position = "plot") +
  labs(title = "Battles, Explosion & Violence against Civilian Have Been Happening in Myanmar Over Time\n With More Occurrence Happening From Year 2020 Onwards", subtitle = "Year 2010 to Year 2023") +
  labs(y = "Year (2010-2023)",
       x = "Event Types", 
       caption = "Data Source: ACLED (2023)")

ggplotly(box2)

4.3 Timeseries Analysis

Calendar visualisation of armed conflicts and fatalities

The dataset contains a variable, event_date, which records the date on which an armed conflict incident took place. The team hopes to use a calendar heatmap to visualise the number of incidents and fatalities that occurred each day and to identify patterns or anomalies in Myanmar.

Design Features
  • The prototype proposes to include the following interactivity elements for users’ data exploration:
    • Dropdown filters on years
  • geom_tile() draws one tile for each day of each month, shaded by the total fatalities on that day.
  • ggplotly() with a custom tooltip shows the date, total fatalities and total incidents on hover.

The code chunk below derives new variables using wday(), mday(), months(), isoweek() and zoo::as.yearmon().

calendar <- final %>%
  filter(fatalities > 0) %>%
  group_by(year, event_date, admin1) %>%
  mutate(
    wkday = wday(event_date),
    day = mday(event_date),
    month = factor(months(event_date), levels = rev(month.name)),
    week = isoweek(event_date),
    year_month = format(zoo::as.yearmon(event_date), "%y-%m")
  ) %>%
  ungroup()


# selected years
# years <- c(2010:2023)
years <- 2023

cal_conflict <- calendar %>%
  group_by(year, day, month) %>%
  filter(year == years) %>%
  summarise(total_fata = sum(fatalities),
            total_inci = n()) %>%
  ungroup()
# tooltip
tooltip_heat <- paste("<b>", cal_conflict$day, " ", cal_conflict$month, " ", cal_conflict$year, "</b>", 
                      "\nFatalities : ", cal_conflict$total_fata,
                      "\nIncidents : ", cal_conflict$total_inci)

heat <- ggplot(cal_conflict, aes(x = day, y = month, fill = total_fata)) + 
  geom_tile(color = "white", size = 1, aes(text = tooltip_heat)) + 
  theme_tufte(base_family = "Helvetica") + 
  coord_equal() +
  scale_fill_gradient(name = "Total Fatalities", low = "#fff2f4", high = "lightcoral") +
  labs(x = "Days of Month", 
       y = "", 
       title = paste("Fatalities due to Armed Conflicts in Myanmar in", years), # paste() already inserts a space
       caption = "Data Source: ACLED (2023)") +
  theme(axis.ticks = element_blank(),
        axis.text.x = element_text(size = 7),
        plot.title = element_text(hjust = 0.5),
        legend.title = element_text(size = 8),
        legend.text = element_text(size = 6),
        legend.position = "top") +
  scale_x_continuous(breaks = seq(min(cal_conflict$day), max(cal_conflict$day), by = 2),
                     labels = seq(min(cal_conflict$day), max(cal_conflict$day), by = 2)) 
# Convert ggplot to plotly (to include custom tooltip)
heat_plotly <- ggplotly(heat, tooltip = "text")

heat_plotly
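
The date helpers used to build the calendar variables can be checked against a single known date. This is a small sketch; `week_start = 7` is passed explicitly so the result does not depend on a user's lubridate options, and base format() is used for the year-month key as an assumed equivalent of the zoo-based call for this format string.

```r
library(lubridate)

d <- as.Date("2023-12-31")  # a Sunday

wday(d, week_start = 7)  # 1: with weeks starting on Sunday, Sunday is day 1
mday(d)                  # 31: day of the month
isoweek(d)               # 52: ISO weeks run Monday to Sunday
format(d, "%y-%m")       # "23-12": two-digit year and month
```

Note that ISO week numbers can belong to a different year from the calendar date near the turn of the year, which matters if week is later combined with year.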

Trend of armed conflicts and fatalities in Myanmar

The dataset contains a variable, fatalities, which records the number of reported fatalities arising from each armed conflict event.

The team hopes to make use of a line chart to visualise the trend in the number of incidents and fatalities in Myanmar.

Design Features
  • The prototype proposes to include the following interactivity elements for users’ data exploration:
    • Dropdown filters such as event_type
    • Radio button selection on total armed conflicts or total fatalities
    • Slider bar to select the years
    • Checkbox selection to filter/ select by sub-national administrative region 1
  • Information such as the period and the counts of armed conflicts and fatalities will be shown on hover.

year_fata <- calendar %>%
  filter(fatalities > 0) %>%
  group_by(year_month) %>%
  select(year, month, year_month, fatalities) %>%
  summarise(total_fata = sum(fatalities),
            total_inci = n()) %>%
  ungroup()

hc_plot1 <-  highchart() %>% 
  hc_add_series(year_fata, hcaes(x = year_month, y = total_fata), type = "line", 
                name = "Total Fatalities", color = "lightcoral") %>%
    hc_add_series(year_fata, hcaes(x = year_month, y = total_inci), type = "line", 
                name = "Total Incidents", color = "black") %>%
  hc_tooltip(crosshairs = TRUE, headerFormat = "", 
             backgroundColor = "#FCFFC5",
             borderWidth = 5, # borderWidth was previously passed twice (1.5 and 5)
             pointFormat = "<b>20{point.year_month}</b> 
                                 <br> Fatalities: <b>{point.total_fata}</b>
                                 <br> Incidents: <b>{point.total_inci}</b>"
             ) %>%
  hc_title(text = "Armed Conflict Over The Years") %>% 
  hc_subtitle(text = "2010 to 2023") %>%
  hc_xAxis(title = list(text = "2010-2023"), labels = list(enabled = FALSE)) %>%
  hc_yAxis(title = list(text = "Frequency"),
           allowDecimals = FALSE,
           plotLines = list(list(
             color = "lightcoral", width = 1, dashStyle = "Dash",
             value = mean(year_fata$total_fata),
             label = list(text = paste("Average Monthly Fatalities:", round(mean(year_fata$total_fata))),
             style = list(color = 'lightcoral', fontSize = 20))))) %>% 
  hc_add_theme(hc_theme_flat())
hc_plot1

4.4 Text Analysis

Visual display of incident summary over the years

The dataset contains a variable, notes, which records a short description of each event. Given the volume of incidents that happen daily, the team hopes to use a word cloud to give a quick visual summary of the keywords recorded and to identify possible changes in the incidents over the years in Myanmar.

Design Features
  • The prototype proposes to include the following interactivity elements for users’ data exploration:
    • Dropdown filters on years. This allows a quick visual summary of the keywords used in all armed conflict incidents of each year.

cloudtext1 <- final %>%
    select(year, notes)

cloudtext2 <- function(selected_year) {
  # keep only the non-empty notes recorded in the selected year
  subset_data <- cloudtext1 %>%
    filter(year == selected_year, notes != "")
  
  # build the corpus and clean the text
  docs <- Corpus(VectorSource(subset_data$notes))
  docs <- tm_map(docs, removeNumbers)
  docs <- tm_map(docs, removeWords, stopwords("english"))
  docs <- tm_map(docs, removePunctuation)
  docs <- tm_map(docs, stripWhitespace)
  docs <- tm_map(docs, stemDocument)
  
  # term frequencies, most frequent first
  dtm <- TermDocumentMatrix(docs)
  m <- as.matrix(dtm)
  v <- sort(rowSums(m), decreasing = TRUE)
  d <- data.frame(word = names(v), freq = v)
  
  wordcloud(d$word, d$freq, colors = brewer.pal(9, "Set3"), random.order = FALSE, rot.per = 0)
  title(main = paste("Incident Summary -", selected_year), font.main = 1, col.main = "black", cex.main = 1.5)
}
# selected years
# years <- c(2010:2023)

years <- 2023
# Word clouds for each year
cloud_patch <- lapply(years, cloudtext2)
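
The word-frequency table that feeds wordcloud() can be illustrated without tm on two made-up notes. This base R sketch only mirrors the idea of the pipeline above (lower-casing, dropping numbers, punctuation and stop words, then counting); it skips stemming, and the short `stop_words` vector is a stand-in for the much longer list returned by stopwords("english").

```r
notes <- c(
  "On 31 December 2023, in a village, the military shelled the village.",
  "Protesters marched in the town."
)

# A tiny illustrative stop-word list (an assumption for this example only).
stop_words <- c("on", "in", "a", "the")

# Lower-case, split on anything that is not a letter, drop empties and stop words.
tokens <- unlist(strsplit(tolower(notes), "[^a-z]+"))
tokens <- tokens[nchar(tokens) > 0 & !(tokens %in% stop_words)]

# Count each word, most frequent first.
freq <- sort(table(tokens), decreasing = TRUE)
d <- data.frame(word = names(freq), freq = as.integer(freq))
d
```

The resulting data frame has the same word and freq columns that are passed to wordcloud() in the function above.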

4.5 Network Analysis

The dataset contains the variables actor1, actor2, assoc_actor_1 and assoc_actor_2. These record the main groups involved in the armed conflict events as well as the associated groups involved alongside the main groups.

There are multiple actors (1,364 unique groups) that have created conflicts in Myanmar. The team hopes to make use of a network graph to schematically depict the nodes and connections amongst the different actors and their associations.

Design Features
  • The prototype proposes to include the following interactivity elements for users’ data exploration:
    • Dropdown filters on years. This allows a quick visual summary of the network association of the actors in each year.

# calculate frequency
conflict_count = final %>% 
  group_by(actor1) %>% 
  summarise(frequency = n()) %>%
  arrange(desc(frequency)) 

conflict_count2 = final %>% 
  group_by(actor2) %>% 
  summarise(frequency = n()) %>%
  arrange(desc(frequency))

colnames(conflict_count) = c("actor","Freq")
colnames(conflict_count2) = c("actor","Freq")

# combine both actor 1 & 2
final_conflict_count = rbind(conflict_count,conflict_count2)

final_conflict_count2 = final_conflict_count %>%
  group_by(actor) %>%
  summarise(FrequencyConflicts = sum(Freq)) %>%
  arrange(desc(FrequencyConflicts))

# trim actor/ assoc_actor
final$actor1 = trimws(str_replace(final$actor1, "[Õ]", ""))
final$actor2  = trimws(str_replace(final$actor2, "[Õ]", ""))
final$assoc_actor_1 = trimws(str_replace(final$assoc_actor_1, "[Õ]", ""))
final$assoc_actor_2 = trimws(str_replace(final$assoc_actor_2, "[Õ]", ""))

assoc1 = final %>%
  filter(!is.na(actor1)) %>%
  filter(!is.na(assoc_actor_1)) %>%
  filter(!(actor1 == "")) %>%
  filter(!(assoc_actor_1 == "")) %>%
  select(actor1,assoc_actor_1)

assoc2 = final %>%
  filter(!is.na(actor2)) %>%
  filter(!is.na(assoc_actor_2)) %>%
  filter(!(actor2 == "")) %>%
  filter(!(assoc_actor_2 == "")) %>%
  select(actor2,assoc_actor_2)

colnames(assoc1) = c("actor","assoc_actor") 
colnames(assoc2) = c("actor","assoc_actor") 

# combine both assoc 1 & 2
combined1 = rbind(assoc1,assoc2)

combined2 = combined1 %>%
  group_by(actor,assoc_actor) %>%
  tally(sort = TRUE) 


final_conflict_count3 = trimws(final_conflict_count2$actor)

combined3 = combined2 %>%
  filter(actor %in% final_conflict_count3)


# network graph
viz_actors_12 <- function(actors_12) {
  set.seed(2016)
  a <- grid::arrow(type = "closed", length = unit(.15, "inches"))
  
  actors_12 %>%
    graph_from_data_frame() %>%
    ggraph(layout = "fr") +
    geom_edge_link(aes(edge_alpha = n, edge_width = n), edge_colour = "lightcoral") +
    geom_node_point(size = 5) +
    geom_node_text(aes(label = name), repel = TRUE,
                   point.padding = unit(0.5, "lines")) +
    theme_void()
}


combined3 %>%
  filter(n >= 100) %>%
  viz_actors_12
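
The str() output earlier shows that assoc_actor_1 can hold several groups in one string, separated by "; ". The pipeline above treats each such string as a single node. As a possible refinement, and not what the chart above does, the sketch below splits those strings into one row per associated group before counting edges. The group names are made up, and it assumes "; " is the only delimiter used.

```r
library(dplyr)
library(tidyr)

pairs <- data.frame(
  actor       = c("Group A", "Group A", "Group B"),
  assoc_actor = c("Group X; Group Y", "Group X", "Group Y"),
  stringsAsFactors = FALSE
)

edges <- pairs %>%
  separate_rows(assoc_actor, sep = ";\\s*") %>%  # one row per associated group
  count(actor, assoc_actor, sort = TRUE)         # n = number of co-occurrences

edges
```

An edge list in this shape (two node columns plus n) can be passed to graph_from_data_frame() in the same way as combined3.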

5. Geospatial Exploratory Data Analysis


Geospatial Exploratory Data Analysis can be found via link here.

This segment has been separated on a standalone web page due to Quarto rendering capacity.

6. Prototype: Exploratory


Prototype: Exploratory can be found via link here.

This segment has been separated on a standalone web page due to Quarto rendering capacity.
